2023-03-09

Agenda

  • The DAUF project - “Datadriven Analys och Uppföljning av KTHs Forskning” (Data-driven Analysis and Follow-up of KTH's Research)

  • A demo of the “KTH Publication Analysis app” with project data from CASE

  • Overview of available data sources for projects at KTH

  • Your questions and feedback

Background and progress

DAUF - background and context

Trend towards better overview of research outputs as well as integration of systems.

Example from Stanford

Stanford RIALTO project (YouTube)

Progress - What is new during Q1 2023?

  • KTH Publication Analysis app
    • allowing exploration of co-publication and collaboration at KTH
    • for ad-hoc research groups
    • rolling data
    • CASE project data
  • Mobilization of project data from other sources
    • CASE
    • SweCris
    • CORDIS (EU projects)
    • Other sources

Demo of KTH Publication Analysis app

KTH Publication Analysis app

The goal of this app is to help answer some common questions, for example:

  • Who are the main collaborating partners of division X?
  • How large is the publication output of department Y over time, in co-publication with German organisations?
  • Which senior staff at CBH are collaborating with MIT, and what is the bibliometric performance of this research compared to the CBH baseline?
  • What is the publication output and project count of the staff related to project X (an ad-hoc group of KTH staff)?

The idea is that the analysis app will complement the Annual bibliometric monitoring (ABM).

KTH Publication Analysis app

Example based on TECoSA

The TECoSA Centre, running since 2020, is used as an example; the group is based on 9 PIs.

Some summary stats:

  • 462 publications 2010-2021

  • 137 publications 2020-2021 (i.e. from the TECoSA period)

  • 28 projects in CASE

  • 14 projects 2020-2021 (CASE seems to hold more, but some lack data on duration)

  • Info on funders and SDGs for projects

  • Picture of collaborations, through co-publication

Data for projects at KTH

DAUF and data about projects

  • DAUF has been building bottom-up data consolidation and analytics for KTH, making links between Researchers <-> Outputs <-> Projects

  • “Which projects are active? Which researchers are involved? Grant sizes?”

  • What is a project, really? It is an amorphous concept:

    • Externally financed research activity, i.e. funders and funding exist?
    • Internal ongoing “project work” - internally financed (or not financed at all)?
    • A call which has (not yet?) been awarded financing?
    • A way to group publications and their authors together?
  • There is no single “source of truth” for all projects at KTH

    • Data requirements depend on who asks the questions and for what purpose
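
The Researchers <-> Outputs <-> Projects links mentioned above could be sketched roughly as follows. This is a minimal illustration and not the DAUF implementation: the link tables, identifiers (`r1`, `pub1`, `projA`) and the function name `outputs_by_project` are all hypothetical. It also assumes that every publication by a project member counts towards the project, which is a crude proxy.

```python
from collections import defaultdict

# Hypothetical link tables; identifiers and layout are illustrative only.
authorships = [  # (researcher_id, publication_id)
    ("r1", "pub1"),
    ("r1", "pub2"),
    ("r2", "pub2"),
    ("r3", "pub3"),
]
participations = [  # (researcher_id, project_id)
    ("r1", "projA"),
    ("r2", "projA"),
    ("r3", "projB"),
]


def outputs_by_project(authorships, participations):
    """Map each project to the publications of its participating researchers."""
    pubs_by_researcher = defaultdict(set)
    for researcher, pub in authorships:
        pubs_by_researcher[researcher].add(pub)

    result = defaultdict(set)
    for researcher, project in participations:
        # Assumption: all publications of a participant are attributed to the project.
        result[project] |= pubs_by_researcher[researcher]
    return dict(result)


links = outputs_by_project(authorships, participations)
```

Depending on which definition of “project” is used, the attribution rule in the inner loop would need to change, for example by restricting publications to the project's duration.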

Data sources for projects

  • Overview of available data sources
    • Descriptions of data sources
    • Data flows and access
    • Combining data sources for KTH projects

Data-driven flowchart. Data wrangling, exploratory data analysis, and data enrichment

CASE

  • <Overview here - nr of records, update frequency, fields available, keys>

Pareto chart for missing values (counts and proportions) in CASE
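
The numbers behind such a Pareto chart (missing counts, proportions and a cumulative share per field) could be computed along these lines. This is a sketch under assumptions: the record layout, the field names (`start`, `end`, `funder`) and the toy rows are invented for illustration and do not reflect the actual CASE schema or data. `None` and the empty string are treated as missing.

```python
def missing_value_pareto(records, fields):
    """Return (field, missing_count, missing_share, cumulative_share) rows,
    sorted by missing count in descending order."""
    total = len(records)
    counts = {
        f: sum(1 for r in records if r.get(f) in (None, ""))
        for f in fields
    }
    all_missing = sum(counts.values())

    rows, cumulative = [], 0
    for field, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += n
        rows.append((
            field,
            n,
            n / total if total else 0.0,                        # share of records missing this field
            cumulative / all_missing if all_missing else 0.0,   # cumulative share of all missing values
        ))
    return rows


# Hypothetical toy records, not real CASE data.
records = [
    {"project_id": "p1", "start": "2020-01-01", "end": None, "funder": "F1"},
    {"project_id": "p2", "start": "", "end": None, "funder": "F2"},
    {"project_id": "p3", "start": "2021-06-01", "end": "2023-05-31", "funder": None},
    {"project_id": "p4", "start": "2020-09-01", "end": None, "funder": "F1"},
]
rows = missing_value_pareto(records, ["project_id", "start", "end", "funder"])
```

The bars of the chart correspond to the counts and the line to the cumulative share, which makes it easy to see which few fields account for most of the gaps.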

SweCris

  • <Overview here - nr of records, update frequency, fields available, keys>

CORDIS

  • <Overview here - nr of records, update frequency, fields available, keys>

Other sources

  • <Overview here - nr of records, update frequency, fields available, keys>

Combining data sources for projects

  • <Overlap - Unions and intersections for different data sources>

Entity/Relationship (ER) diagram. Shared identifiers among specific relational databases (RDB).
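
Given a shared identifier, the overlap between sources could be summarised as below. This is a minimal sketch: it assumes each source has already been reduced to a set of comparable project keys, the keys (`k1` ... `k6`) are invented, and the function name `source_overlap` is hypothetical. In practice the identifiers would likely need normalisation before they can be compared.

```python
from itertools import combinations


def source_overlap(sources):
    """Summarise pairwise intersections and the overall union of project keys."""
    pairwise = {
        (a, b): len(sources[a] & sources[b])
        for a, b in combinations(sorted(sources), 2)
    }
    union = set().union(*sources.values())
    return {"pairwise": pairwise, "union_size": len(union)}


# Hypothetical key sets per source, for illustration only.
sources = {
    "CASE": {"k1", "k2", "k3", "k4"},
    "SweCris": {"k2", "k3", "k5"},
    "CORDIS": {"k4", "k6"},
}
overlap = source_overlap(sources)
```

The pairwise counts indicate how many projects two sources have in common, and the union size gives the number of distinct projects across all sources.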

Selection (Urval)

Future work and directions

  • <Product Owners, Future Outlook>
  • Subject area analysis
    • UKÄ/SCB categories
    • Topic clusters and keywords
    • Journal classifications

Questions and feedback

Questions and Answers

Please provide your input!

  • Do you think what was presented in the demonstration meets your needs for analytical exploration?
  • Would additional reference datasets be needed?
  • Would additional ways to slice and dice the data along other dimensions be useful?
  • Would parameterized interactive reports be useful in addition to the raw data and the app?
  • Are metrics related to author sequence/order useful?
  • Other questions from the Zoom chat
  • Suggestions and comments

If you prefer to provide written feedback, please use the following Jamboard

Thank you for attending!